This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence. Two cases are considered: (i) the image locations of the human joints are provided and (ii) the image locations of the joints are unknown. For the former case, a novel approach is introduced that integrates a sparsity-driven 3D geometric prior and temporal smoothness. For the latter case, this approach is extended by treating the image locations of the joints as latent variables. A deep fully convolutional network is trained to predict the uncertainty maps of the 2D joint locations. The 3D pose estimates are obtained via an Expectation-Maximization algorithm over the entire sequence, where it is shown that the 2D joint location uncertainties can be conveniently marginalized out during inference. Empirical evaluation on the Human3.6M dataset shows that the proposed approaches achieve higher 3D pose estimation accuracy than state-of-the-art baselines. Further, the proposed approach outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.